Version 3 of the Global Aridity Index and Potential Evapotranspiration Database

The “Global Aridity Index and Potential Evapotranspiration Database - Version 3” (Global-AI_PET_v3) provides high-resolution (30 arc-seconds) global hydro-climatic data averaged (1970–2000) monthly and yearly, based upon the FAO Penman-Monteith Reference Evapotranspiration (ET0) equation. An overview of the methods used to implement the Penman-Monteith equation geospatially and a technical evaluation of the results is provided. Results were compared for technical validation with weather station data from the FAO “CLIMWAT 2.0 for CROPWAT” (ET0: r2 = 0.85; AI: r2 = 0.90) and the U.K. “Climate Research Unit: Time Series v 4.04” (ET0: r2 = 0.89; AI: r2 = 0.83), while showing significant differences to an earlier version of the database. The current version of the Global-AI_PET_v3 supersedes previous versions, showing a higher correlation to real world weather station data. Developed using the generally agreed upon standard methodology for estimation of reference ET0, this database and notably, the accompanying source code, provide a robust tool for a variety of scientific applications in an era of rapidly changing climatic conditions.


Methods
Calculating Potential Evapotranspiration using Penman-Monteith. Among several equations used to estimate PET, an implementation of the Penman-Monteith equation originally presented by the Food and Agriculture Organization (FAO-56) 1 is considered a standard method 3,12,13,49 . FAO-56 1 defined PET as the ET of a reference crop (ET 0 ) under optimal conditions, in this case well-watered grass with an assumed height of 12 centimeters, a fixed surface resistance of 70 seconds per meter and an albedo of 0.23 1 . More generally, "reference evapotranspiration" (ET 0 ) measures the rate at which readily available soil water is evaporated from specified vegetated surfaces 2,13 , i.e., from a uniform surface of dense, actively growing vegetation having specified height and surface resistance, not short of soil water, and representing an expanse of at least 100 m of the same or similar vegetation 1,13 . ET 0 is one of the essential hydrological variables used in many research efforts, such as the study of the hydrologic water balance, crop yield simulation, irrigation system management and water resources management, allowing researchers and practitioners to study the evaporative demand of the atmosphere independent of crop type, crop development and management practices 2,4,13,49 . ET 0 values measured or calculated at different locations or in different seasons are comparable as they refer to the ET from the same reference surface. The factors affecting ET 0 are climatic parameters and crop-specific resistance coefficients solved for the reference vegetation. Crop-specific coefficients (K c ) may then be used to derive the ET of specific crops (ET c ) from ET 0 1 . As the Penman-Monteith methodology is predominantly a climatic approach, it can be applied globally, as it does not require estimation of additional site-specific parameters.
However, a major drawback of the Penman-Monteith method is its relatively high need for specific data for a variety of parameters (i.e., windspeed, relative humidity, solar radiation). Zomer et al. 18 compared five methods of calculating PET with parameters from data available at the time and settled upon using a Modified Hargreaves-Thornton equation 50 which required less parametrization to produce the Global-AI_PET_v1 [16][17][18] . Several other attempts to produce global PET datasets with concurrently available global datasets came to similar conclusions [51][52][53] . The Modified Hargreaves-Thornton method required less parameterization with relatively good results, relying on datasets which were available at the time for a globally applicable modeling effort. The Global-AI_PET_v1 used the WorldClim_v1.4 20 downscaled climate dataset (30 arcseconds; averaged over the period 1960-1990) for input into the global geospatial implementation of the Modified Hargreaves-Thornton equation, applied on a per grid cell basis at approximately 1 km resolution (30 arcseconds). More recently, the UK Climate Research Unit released the "CRU_TS Version 4.04", which now includes a Penman-Monteith calculated PET (ET 0 ) global coverage, however at a relatively coarse resolution of 0.5 × 0.5 degrees. A number of satellite-based remote sensing datasets 22,[54][55][56][57] are now available and in use to provide the parameters for ET 0 estimates, in some cases providing high spatial and/or temporal resolution and are likely to become increasingly utilized as the historical data record lengthens and sensors improve.
The latest versions of WorldClim 58 (the 2.0 series; currently version 2.1, released January 2020), in addition to being updated with improved data and analysis and a revised baseline, include several additional primary climatic variables beyond temperature and precipitation, namely solar radiation, wind speed and water vapor pressure. With these variables, the global data now available are sufficient to parameterize the FAO-56 equation and estimate ET 0 globally at 30 arc-seconds resolution (~1 km at the equator).
The FAO-56 Penman-Monteith equation, described in detail below, has been implemented on a per grid cell basis at 30 arc seconds resolution, using the Python programming language (version 3.2). The data to parameterize the various component equations required to arrive at the ET 0 estimate were obtained from the WorldClim 2.1 58 climatological dataset, which provides values averaged over the time period 1970-2000 for minimum, maximum and average temperature; solar radiation; wind speed; and water vapor pressure. Subroutines in the program include calculation of the psychrometric constant (aerodynamic resistance), saturation vapor pressure, vapor pressure deficit, slope of the vapor pressure curve, air density at constant pressure, net shortwave radiation at the crop surface, clear-sky solar radiation, net longwave radiation at the crop surface, net radiation at the crop surface, and the calculation of daily and monthly ET 0 . Geospatial processing and analysis were done using ArcGIS Pro v 2.9 (ESRI, 2020) and the Python (ArcPy) programming language (version 3.2), with Microsoft Excel used for further data analysis, graphics and presentation.
Global Reference Evapotranspiration (Global-ET 0 ). Penman 59 , in 1948, first combined the radiative energy balance with the aerodynamic mass transfer method and derived an equation to compute evaporation from an open water surface from standard climatological records of sunshine, temperature, humidity and wind speed. This combined approach eliminated the need for the parameter "most difficult" to measure, surface temperature, and allowed, for the first time, theoretical estimates of ET to be made from standard meteorological data. Consequently, these estimates could also now be made retrospectively. This so-called combination method was further developed by many researchers and extended to cropped surfaces by introducing resistance factors. Among the various derivations of the Penman equation is the inclusion of a bulk surface resistance term 60 .
The actual vapour pressure [e a , kPa] is the vapour pressure exerted by the water in the air and is usually calculated as a function of relative humidity [RH]. Water vapour pressure is already available as one of the WorldClim 2.1 variables.
The vapour pressure deficit (e s − e a ) [kPa] is the difference between the saturation (e s ) and actual (e a ) vapour pressure.
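The vapour pressure deficit step can be sketched in a few lines of Python. This is a minimal illustration, not the database's source code: it assumes the FAO-56 expression for saturation vapour pressure and the FAO-56 convention of averaging the values at maximum and minimum temperature, and the function names are hypothetical.

```python
import math


def saturation_vapour_pressure(temp_c: float) -> float:
    """Saturation vapour pressure e0(T) in kPa at air temperature T in deg C (FAO-56 form)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapour_pressure_deficit(tmax_c: float, tmin_c: float, ea_kpa: float) -> float:
    """Deficit (e_s - e_a) in kPa.

    Assumption: e_s is the mean of e0(Tmax) and e0(Tmin), as FAO-56 recommends;
    e_a is supplied directly (for example from a gridded vapour pressure layer).
    """
    es = (saturation_vapour_pressure(tmax_c) + saturation_vapour_pressure(tmin_c)) / 2.0
    return es - ea_kpa


# Hypothetical monthly values for a single grid cell.
print(round(vapour_pressure_deficit(24.5, 15.0, 1.40), 2))
```

Averaging the two saturation values rather than evaluating the curve at the mean temperature avoids underestimating e s , because the curve is strongly non-linear.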

Slope of Saturation Vapor Pressure (Δ).
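Following FAO-56 1 , the slope of the saturation vapour pressure curve is computed as:
$$\Delta = \frac{4098\left[0.6108\,\exp\left(\frac{17.27\,T_{avg}}{T_{avg}+237.3}\right)\right]}{\left(T_{avg}+237.3\right)^{2}}$$
where Δ is in kPa °C −1 and T avg [°C] is the average temperature.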
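A minimal Python sketch of the slope calculation is given here, assuming the standard FAO-56 expression (4098 times the saturation vapour pressure at T avg , divided by the square of T avg + 237.3). The function name is hypothetical and is not taken from the database's source code.

```python
import math


def slope_saturation_vapour_pressure(tavg_c: float) -> float:
    """Slope of the saturation vapour pressure curve, in kPa per deg C, at T_avg in deg C."""
    # Saturation vapour pressure at T_avg (kPa), FAO-56 form.
    e0 = 0.6108 * math.exp(17.27 * tavg_c / (tavg_c + 237.3))
    return 4098.0 * e0 / (tavg_c + 237.3) ** 2


# The slope increases with temperature, so warm cells weight the radiation term more heavily.
for t in (0.0, 10.0, 20.0, 30.0):
    print(t, round(slope_saturation_vapour_pressure(t), 4))
```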

Net Radiation At The Crop Surface (R n ).
Net radiation [R n , MJ m −2 day −1 ] is the difference between the net shortwave radiation [R ns , MJ m −2 day −1 ] and the net longwave radiation [R nl , MJ m −2 day −1 ], and is calculated using solar radiation (R s ). In WorldClim 2.1, solar radiation (R s ) is given in kJ m −2 day −1 ; for computation of ET 0 its value is therefore divided by 1000 to convert it to MJ m −2 day −1 . The net accounting of both longwave and shortwave radiation sums the incoming and outgoing components.
The net shortwave radiation [R ns , MJ m −2 day −1 ] is the fraction of the solar radiation R s that is not reflected from the surface. The fraction of the solar radiation reflected by the surface is known as the albedo [α]. For the green grass reference crop, α is assumed to have a value of 0.23. The value of R ns is:
$$R_{ns} = (1 - \alpha)\,R_s$$
The difference between outgoing and incoming longwave radiation is called the net longwave radiation [R nl ]. As the outgoing longwave radiation is almost always greater than the incoming longwave radiation, R nl represents an energy loss. Longwave energy emission is related to surface temperature following the Stefan-Boltzmann law. Thus, longwave radiation emission is calculated as positive in the outward direction, while shortwave radiation is positive in the downward direction. The net energy flux leaving the earth's surface is influenced as well by humidity and cloudiness:
$$R_{nl} = \sigma\left[\frac{T_{max,K}^{4} + T_{min,K}^{4}}{2}\right]\left(0.34 - 0.14\sqrt{e_a}\right)\left(1.35\,\frac{R_s}{R_{so}} - 0.35\right)$$
where σ represents the Stefan-Boltzmann constant (4.903 × 10 −9 MJ K −4 m −2 day −1 ), T max,K and T min,K are the maximum and minimum absolute temperatures (in Kelvin; K = °C + 273.16), e a is the actual vapour pressure, R s is the measured solar radiation [MJ m −2 day −1 ] and R so is the calculated clear-sky radiation [MJ m −2 day −1 ]. R so is calculated as a function of extraterrestrial solar radiation [R a , MJ m −2 day −1 ] and elevation (elev, m):
$$R_{so} = \left(0.75 + 2\times10^{-5}\,elev\right)R_a$$
The extraterrestrial radiation [R a , MJ m −2 day −1 ] is estimated from the solar constant, solar declination and day of the year. It requires the latitude and Julian day to compute, trigonometrically, the amount of solar radiation reaching the top of the atmosphere, as shown in Allen et al. 1 .
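The radiation terms described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the formulas are the standard FAO-56 expressions for R a , R so , R ns and R nl , the ratio R s /R so is capped at 1.0 as FAO-56 advises, and all function names are hypothetical.

```python
import math

SIGMA = 4.903e-9  # Stefan-Boltzmann constant, MJ K-4 m-2 day-1
ALBEDO = 0.23     # albedo of the grass reference crop
GSC = 0.0820      # solar constant, MJ m-2 min-1 (FAO-56 value)


def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """R_a in MJ m-2 day-1 from latitude (degrees) and day of year."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Clamp so that polar day or night does not push acos outside [-1, 1].
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))
    ws = math.acos(x)  # sunset hour angle, radians
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws)
    )


def clear_sky_radiation(ra: float, elev_m: float) -> float:
    """R_so in MJ m-2 day-1 from R_a and elevation in metres."""
    return (0.75 + 2e-5 * elev_m) * ra


def net_shortwave(rs_mj: float) -> float:
    """R_ns: the part of R_s not reflected by the reference surface."""
    return (1.0 - ALBEDO) * rs_mj


def net_longwave(rs_mj: float, rso: float, tmax_c: float, tmin_c: float, ea_kpa: float) -> float:
    """R_nl in MJ m-2 day-1; rso must be greater than zero."""
    tmax_k = tmax_c + 273.16
    tmin_k = tmin_c + 273.16
    cloud_ratio = min(rs_mj / rso, 1.0)  # assumption: capped at 1.0 per FAO-56
    return (
        SIGMA * (tmax_k ** 4 + tmin_k ** 4) / 2.0
        * (0.34 - 0.14 * math.sqrt(ea_kpa))
        * (1.35 * cloud_ratio - 0.35)
    )


def net_radiation(rs_kj: float, ra: float, elev_m: float,
                  tmax_c: float, tmin_c: float, ea_kpa: float) -> float:
    """R_n = R_ns - R_nl, with R_s supplied in kJ m-2 day-1."""
    rs_mj = rs_kj / 1000.0  # kJ to MJ
    rso = clear_sky_radiation(ra, elev_m)
    return net_shortwave(rs_mj) - net_longwave(rs_mj, rso, tmax_c, tmin_c, ea_kpa)


# Hypothetical cell: 20 deg S, day 246, sea level.
ra = extraterrestrial_radiation(-20.0, 246)
print(round(ra, 1), round(net_radiation(14500.0, ra, 0.0, 25.1, 19.1, 2.1), 2))
```

For monthly climatologies, a single representative day of year per month (for example the mid-month day) is one common choice for evaluating R a ; that choice is an assumption of this sketch.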
Although the soil heat flux is small compared to R n , particularly when the surface is covered by vegetation, changes of soil heat flux may still be relevant at monthly scale. However, accurate assessments of soil heat flux may require computation of soil heat capacity, related to its mineral composition and water content, which in turn may be rather inaccurate at global scale at a resolution of 30 arc sec. Thus, for simplicity, changes in soil heat fluxes are ignored (G = 0).

Bulk Surface Resistance (r s ).
The resistance nomenclature distinguishes between aerodynamic resistance and surface resistance factors. The surface resistance parameters are often combined into one parameter, the 'bulk' surface resistance parameter, which operates in series with the aerodynamic resistance. The surface resistance, r s , describes the resistance of vapour flow through stomata openings, total leaf area and soil surface. The aerodynamic resistance, r a , describes the resistance from the vegetation upward and involves friction from air flowing over vegetative surfaces. Although the exchange process in a vegetation layer is too complex to be fully described by the two resistance factors, good correlations can be obtained between measured and calculated evapotranspiration rates, especially for a uniform grass reference surface.
A general equation for the bulk surface resistance (r s , [s m −1 ]) describes a ratio between the bulk stomatal resistance of a well illuminated leaf (r l ) and the active sunlit leaf area index of the vegetation (LAI active ):
$$r_s = \frac{r_l}{LAI_{active}}$$
The stomatal resistance of a single leaf under well-watered conditions has a value of about 100 s m −1 . It can be assumed that about half (0.5) of the total LAI is actively contributing to vapour transfer, while it can also be roughly generalized that for short crops there is a linear relation between LAI and crop height (h):
$$LAI = 24\,h$$
When the evapotranspiration simulated with the Penman-Monteith method is referred to a specific reference crop, denoted as ET 0 , the computation can be simplified by fixing specific variables a priori to constant values. In this case, the reference surface is a hypothetical grass reference crop, well-watered grass of uniform height, actively growing and completely shading the ground, with an assumed crop height of 0.12 m and an albedo of 0.23. The surface resistance for this hypothetical grass can be simplified to the following:
$$r_s = \frac{100}{0.5 \times 24 \times 0.12} \approx 70\ \mathrm{s\,m^{-1}}$$
For such a reference crop the surface resistance is thus fixed at 70 s m −1 , which implies a moderately dry soil surface resulting from about a weekly irrigation frequency.
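The arithmetic behind the fixed 70 s m −1 value can be checked with a short Python sketch. It assumes the FAO-56 relation LAI = 24 h for clipped grass and an active fraction of 0.5; the function name is hypothetical.

```python
def bulk_surface_resistance(crop_height_m: float,
                            leaf_resistance: float = 100.0,
                            active_fraction: float = 0.5) -> float:
    """Bulk surface resistance r_s in s m-1 for a short, well-watered crop."""
    lai = 24.0 * crop_height_m          # assumed linear LAI-height relation (FAO-56)
    lai_active = active_fraction * lai  # sunlit leaf area contributing to vapour transfer
    return leaf_resistance / lai_active


# Grass reference crop, 0.12 m tall: about 69.4 s m-1, rounded to 70 s m-1 in practice.
print(round(bulk_surface_resistance(0.12), 1))
```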
When crop height is equal to 0.12 m and wind/humidity measurements are taken at a height of 2 m, the aerodynamic resistance (r a , [s m −1 ]) can be simplified as:
$$r_a = \frac{208}{u_2}$$
where u 2 is the wind speed at 2 m height [m s −1 ].
The Aridity Index (AI) is calculated as the ratio of mean annual precipitation to mean annual reference evapotranspiration:
$$AI = \frac{MA\_Prec}{MA\_ET_0}$$
Mean annual precipitation (MA_Prec) values were obtained from WorldClim v 2.1 58 , as averaged over the period 1970-2000, while ET 0 datasets estimated on a monthly average basis by the Global-ET 0 (i.e., modeled using the method described above) were aggregated to mean annual values (MA_ET 0 ). Using this formulation, AI values are unitless, increasing with more humid conditions and decreasing with more arid conditions. As a general reference, the climate classification scheme for Aridity Index values provided by UNEP 64 gives insight into the climatic significance of the range of moisture availability conditions described by the AI.
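To show how the components fit together, the sketch below combines them into a daily ET 0 value and then an Aridity Index. It is an illustration only: it assumes the simplified FAO-56 reference equation (in which r s = 70 s m −1 and r a = 208/u 2 give the 0.34 u 2 term), the FAO-56 pressure and psychrometric constant formulas, and AI taken as mean annual precipitation divided by mean annual ET 0 . Function names are hypothetical.

```python
def atmospheric_pressure(elev_m: float) -> float:
    """Atmospheric pressure in kPa as a function of elevation (FAO-56 form)."""
    return 101.3 * ((293.0 - 0.0065 * elev_m) / 293.0) ** 5.26


def psychrometric_constant(elev_m: float) -> float:
    """Psychrometric constant gamma in kPa per deg C."""
    return 0.665e-3 * atmospheric_pressure(elev_m)


def reference_et0(delta: float, gamma: float, rn: float, t_mean_c: float,
                  u2: float, vpd: float, g: float = 0.0) -> float:
    """Daily ET0 in mm day-1 (FAO-56 reference form).

    delta: slope of saturation vapour pressure curve (kPa per deg C)
    rn, g: net radiation and soil heat flux (MJ m-2 day-1); g defaults to 0
    u2: wind speed at 2 m (m s-1); vpd: e_s - e_a (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean_c + 273.0)) * u2 * vpd
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


def aridity_index(ma_prec_mm: float, ma_et0_mm: float) -> float:
    """Unitless AI = MA_Prec / MA_ET0; higher values indicate more humid conditions."""
    if ma_et0_mm <= 0.0:
        raise ValueError("mean annual ET0 must be positive")
    return ma_prec_mm / ma_et0_mm


# Hypothetical warm, low-elevation cell.
gamma = psychrometric_constant(2.0)
et0_day = reference_et0(delta=0.246, gamma=gamma, rn=14.33, t_mean_c=30.2, u2=2.0, vpd=1.57)
print(round(et0_day, 2), round(aridity_index(500.0, 1000.0), 2))
```

A monthly total would then be the daily value multiplied by the number of days in the month, and the twelve monthly totals summed to give the annual value used in the AI.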

Data Records
The Reference Evapo-Transpiration (Global-ET 0 ) and Aridity Index (Global-AI) datasets included in the Global-AI_PET_v3 Database provide high-resolution (30 arc-seconds) global raster climate data for the 1970-2000 period, related to evapo-transpiration processes and rainfall deficit for potential vegetative growth, based upon implementation of a Penman-Monteith Reference Evapo-transpiration (ET 0 ) equation. Dataset files include the following geospatial raster datasets (distributed online in GeoTIFF format) covering the entire world:
Global-ET 0 . Geospatial raster datasets are available as monthly averages (12 data layers, i.e., one layer for each month) or as an annual average (1 dataset) for the 1970-2000 period, plus the standard deviation of the annual average (1 dataset).
Global-AI. Geospatial raster datasets are available as monthly averages (12 data layers, i.e. one layer for each month) or as an annual average (1 data layer) for the 1970-2000 period.
The ET 0 geodataset values are defined as the total mm of ET 0 per month or per year. The AI values reported in the GeoTIFF (.tif) files have been multiplied by a factor of 10,000 so that the data can be distributed as integers while retaining four decimal places of precision (integer rasters are more efficient in computing time and storage space than real or floating-point values). The AI values in the GeoTIFF (.tif) files therefore need to be multiplied by 0.0001 to retrieve the values in the correct units.
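The rescaling can be illustrated with a minimal Python sketch. The function names are hypothetical; reading the raster itself (and handling any NoData value, which is not specified here) is left to whichever GIS library is in use.

```python
AI_SCALE_FACTOR = 0.0001  # stored integer -> unitless Aridity Index


def decode_ai(stored_value: int) -> float:
    """Convert a stored integer cell value to the unitless Aridity Index."""
    return stored_value * AI_SCALE_FACTOR


def encode_ai(ai: float) -> int:
    """Inverse operation: scale by 10,000 and round to the nearest integer."""
    return int(round(ai * 10000))


# A stored value of 6500 corresponds to AI = 0.65.
stored_cells = [312, 2000, 6500, 14250]  # hypothetical cell values
print([round(decode_ai(v), 4) for v in stored_cells])
```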
The geospatial dataset is in geographic coordinates; datum and spheroid are WGS84; spatial units are decimal degrees. The spatial resolution is 30 arc-seconds or 0.008333 degrees.
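As a worked check of these figures, the short Python sketch below converts 30 arc-seconds to decimal degrees and to an approximate ground distance. The length of one degree at the equator (about 111.32 km) and the assumption of a grid spanning the full 360° by 180° are illustrative values, not taken from the dataset documentation.

```python
import math

ARC_SECONDS = 30
CELLS_PER_DEGREE = 3600 // ARC_SECONDS  # 120 cells per degree
CELL_SIZE_DEG = 1.0 / CELLS_PER_DEGREE  # 0.008333... degrees
KM_PER_DEGREE_EQUATOR = 111.32          # approximate; assumption for illustration


def cell_width_km(lat_deg: float) -> float:
    """Approximate east-west width of one cell in km at a given latitude."""
    return CELL_SIZE_DEG * KM_PER_DEGREE_EQUATOR * math.cos(math.radians(lat_deg))


def global_grid_shape() -> tuple:
    """(rows, columns) of a grid covering 180 x 360 degrees at this resolution."""
    return 180 * CELLS_PER_DEGREE, 360 * CELLS_PER_DEGREE


print(round(CELL_SIZE_DEG, 6), global_grid_shape())
print(round(cell_width_km(0.0), 3), round(cell_width_km(60.0), 3))
```

The east-west cell width shrinks with the cosine of latitude, so "approximately 1 km" holds only near the equator.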
The ET 0 and AI datasets have been processed and finalized in GeoTIFF data format. These rasters have been zipped (.zip) into monthly series or individual annual layers available for online access at: https://doi.org/10.6084/m9.figshare.7504448.v5 46 .

Technical Validation
The global estimations of ET 0 and AI were first evaluated against the FAO "CLIMWAT 2.0 for CROPWAT" 47 ( Figs. 1 and 2) global database using long-term monthly mean values of climatic parameters derived from weather station data, roughly covering the period of 1970-2000, concurrent with the temporal coverage of the WorldClim version 2.0/2.1 database. CLIMWAT 2.0 provides observed agroclimatic data of over 5000 stations distributed worldwide (Fig. 3), including monthly averages for seven climatic parameters, namely maximum temperature, minimum temperature, relative humidity, wind speed, sunshine hours, radiation balance and ET 0 calculated according to the Penman-Monteith method, as well as the coordinates and altitude of the station.
Input parameters from the three WorldClim spatial datasets (versions: 1.4; 2.0; 2.1) were compared with the values extracted from the weather station data, by sampling the gridded data at the weather station coordinates, to evaluate the accuracy and overlap of the CLIMWAT and WorldClim datasets and the suitability of using CLIMWAT to evaluate the performance of the ET 0 spatial estimation. An assessment of the digital elevation data (DEM) provided by WorldClim 2.1, and used in our estimation, against that reported by CLIMWAT station data (Table 1; Fig. 4) showed a high level of accuracy (r 2 = 0.98), providing some confidence in the locational accuracy of the weather station data. The elevation data used in the current analysis were virtually identical (r 2 = 1.00) to the DEMs used in previous versions of the Global-AI_PET databases. Likewise, a comparison of mean annual temperature data revealed no significant differences in these datasets (r 2 > 0.98 for all dataset comparisons), with the global average of each being nearly identical (≈ 17.8 °C; Fig. 5), indicating an absence of globally systematic bias towards over- or under-estimation of temperature. Annual precipitation as identified from the WorldClim 2.1 grids was also found to be highly correlated (r 2 = 0.96) with that reported by the CLIMWAT weather station data (Table 1; Fig. 6), but with a moderately high standard error (148 mm), although more than WorldClim 1.4 (r 2 = 0.98), which covered a different temporal span. A comparison of the average global mean annual precipitation (MA_Prec) between the CLIMWAT and the WorldClim v. 2.1 data showed identical results (990 mm), with version 1.4 averaging slightly less (984 mm).
As the input parameters from the WorldClim 2.1 showed high levels of accuracy in comparison to the CLIMWAT data, we concluded that the CLIMWAT was an appropriate dataset available for evaluating the accuracy of the ET 0 and AI estimation algorithms.
The calculation used to derive the ET 0 estimation was tested against the ET 0 estimates provided by CLIMWAT, using the CLIMWAT provided parameters from 4242 weather stations to parameterize the estimation algorithm (Table 1; Fig. 7). The calculated ET 0 was shown to be highly accurate (r 2 = 0.99) with a very low standard error (36 mm), providing confidence that the algorithm provides an accurate estimation. When the algorithm was implemented geospatially on a per grid cell basis to produce the Global-AI_PET_v3 dataset and tested against the CLIMWAT ET 0 estimates from 3842 weather stations, the results showed a relatively high level of accuracy (r 2 = 0.85), sufficient for use within many modeling and other scientific efforts. Local estimates, however, may have high variability associated with steep elevation gradients and heterogeneous terrain, and/or low levels of accuracy at the grid cell level due to interpolation of scattered or less dense weather station data, as there is significant potential for error associated with the global input data.
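A comparison of this kind, between station values and values sampled from the grid at the station coordinates, can be sketched as follows. This is an illustration only: it assumes r 2 is the squared Pearson correlation and that the standard error is the standard error of the estimate from a simple linear regression, which may differ from the exact statistics used in the evaluation. The function name and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def r2_and_standard_error(station: Sequence[float],
                          gridded: Sequence[float]) -> Tuple[float, float]:
    """Return (r squared, standard error of the estimate) for paired samples."""
    n = len(station)
    if n != len(gridded) or n < 3:
        raise ValueError("need at least three paired values of equal length")
    mean_x = sum(station) / n
    mean_y = sum(gridded) / n
    sxx = sum((x - mean_x) ** 2 for x in station)
    syy = sum((y - mean_y) ** 2 for y in gridded)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(station, gridded))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("both series must vary")
    r2 = sxy ** 2 / (sxx * syy)
    # Residual sum of squares of the least-squares line of gridded on station values.
    sse = max(syy - sxy ** 2 / sxx, 0.0)
    return r2, math.sqrt(sse / (n - 2))


# Hypothetical annual ET0 (mm) at five stations and at the matching grid cells.
station_et0 = [820.0, 1105.0, 1430.0, 1675.0, 2010.0]
grid_et0 = [790.0, 1150.0, 1390.0, 1720.0, 1950.0]
print(r2_and_standard_error(station_et0, grid_et0))
```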
Whereas the ET 0 based on the WorldClim 2.1 data was virtually identical to that produced from WorldClim 2.0 (r 2 = 1.00, std error = 27 mm), differences were more pronounced when compared with the PET estimates of the previous Global-AI_PET_v1 (r 2 = 0.65). The ET 0 estimates based on the latest version of WorldClim (v. 2.1) showed a significant improvement over the Modified Hargreaves PET estimates of the Global-AI_PET_v2, which used WorldClim v. 1.4 (r 2 = 0.85 vs r 2 = 0.72), with the Hargreaves methodology systematically underestimating higher PET values. Similarly, the AI estimates based on the Global-AI_PET_v3 analysis, when compared to AI estimates based on parameters provided by the CLIMWAT weather station data (Table 1; Fig. 8), showed a high level of correspondence (r 2 = 0.90), statistically the same as, but nominally slightly lower than, the Global-AI_PET_v1 estimates (r 2 = 0.91).
Similarly, the global estimations of ET 0 were evaluated against the calculated PET (ET 0 ) provided by the CRU_TS (Climatic Research Unit gridded Time Series version 4.05) 48 . The CRU_TS is a widely used climate dataset on a 0.5° latitude by 0.5° longitude grid over all land domains of the world except Antarctica. It is derived by the interpolation of monthly climate anomalies from extensive networks of weather station observations. PET values are provided in the CRU_TS dataset, calculated based upon the Penman-Monteith formula 25,26 , using the CRU_TS gridded values of mean temperature, vapour pressure, cloud cover and wind field. For our comparison, we averaged the CRU_TS monthly values for PET from 1971-2000 to obtain a global coverage of average annual PET for that time period. The same CLIMWAT meteorological stations used in the previous comparisons were used as sample points for the comparison with the latest version of the ET 0 dataset (based on WorldClim v 2.1), and the CLIMWAT ET 0 was also compared with the CRU_TS PET dataset (r 2 = 0.84) to assess general congruence among the datasets (Fig. 9). The CRU_TS precipitation data for that time period was similarly averaged and used to calculate an AI based upon the CRU_TS dataset and compared to the Global-AI_PET_v3. Results showed a high level of agreement for both the ET 0 and the AI comparison (r 2 = 0.89; r 2 = 0.83, respectively), considering the coarser resolution of the CRU_TS data is a likely source of error in the comparison with finer resolution data of the Global-AI_PET_v3.
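The temporal averaging step can be sketched in Python as below. This is a hedged illustration: it assumes the monthly PET values are expressed in mm per day, so each is multiplied by the number of days in its month before the annual totals are averaged over 1971-2000. The data structure and function name are hypothetical.

```python
import calendar
from typing import Dict, Tuple


def mean_annual_total(monthly_mm_per_day: Dict[Tuple[int, int], float],
                      first_year: int = 1971, last_year: int = 2000) -> float:
    """Mean annual total (mm per year) from monthly mean rates keyed by (year, month)."""
    annual_totals = []
    for year in range(first_year, last_year + 1):
        total = 0.0
        for month in range(1, 13):
            days = calendar.monthrange(year, month)[1]  # accounts for leap years
            total += monthly_mm_per_day[(year, month)] * days
        annual_totals.append(total)
    return sum(annual_totals) / len(annual_totals)


# Hypothetical cell with a constant PET of 3 mm per day in every month.
series = {(y, m): 3.0 for y in range(1971, 2001) for m in range(1, 13)}
print(round(mean_annual_total(series), 1))
```

The same function applied to a precipitation series expressed as a daily rate would give the mean annual precipitation needed for the CRU_TS-based AI; if the source values are already monthly totals, the multiplication by days should be dropped.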
Although we caution users on the limitations of the data, we conclude with a high level of confidence that this revised ET 0 /AI dataset, produced using our geospatially implemented algorithm based upon the FAO Penman-Monteith equation, provides an adequate and usable global estimation of PET and AI suitable for a variety of non-mission-critical applications, at scales from local to national, regional and global. Local topography, landscape heterogeneity, and interpolation of weather station networks all contribute to increasing error at more specific levels, such as plot or field level, especially in areas where weather station density is sparse. However, based upon this technical evaluation, the authors concur that the current version of the dataset (Global-AI_PET_v3) is improved over previous versions, with a high correlation to real world weather station data, and as such find it to be a valuable, publicly available global public good, with comparative advantage as a reference resource and global coverage at 30 arc-second resolution. Developed using the agreed upon standard methodology for estimation of ET 0 , based upon FAO-56 Penman-Monteith, this dataset (and its source code) represents a robust tool for a variety of scientific investigations in an era of rapidly changing climatic conditions.